Abstract — Consider an n-dimensional linear system whose initial
state is known to have at most k < n non-zero components. We
consider the observability problem for such a system, that is, the
recovery of the initial state. We obtain sufficient conditions on
the number of available observations that guarantee exact recovery
of the initial state. Both deterministic and stochastic setups are
considered for the system dynamics. In the former setting, the
system matrices are known deterministically, whereas in the latter
setting, all of the matrices are picked from a randomized class of
matrices. The main message is that, when the initial condition is
known to be sparse, one does not need n full observations to
uniquely identify the initial state of the linear system, even when
the observations are picked randomly.



I. Introduction 

A linear system of dimension n is said to be observable if an
ensemble of at most n successive observations guarantees the
recovery of the initial state. Observability is an essential notion
in control theory: together with the sister notion of
controllability, it forms the essence of modern linear control theory.

In this paper, we consider the observability problem when
the number of non-zeros in the initial state of a linear system
is strictly less than the dimension of the system. This might
arise in systems where natural or external forces cause a
certain subset of the components of a linear system to be activated
or excited; for example, an external force may give rise to a
subset of locally unstable states while keeping certain other
states intact.

Furthermore, with the increasing emphasis on networked
control systems, it has been realized that controllability
and observability concepts that assume controllers with
full access to sensory information are not practical.
Many research efforts have focused on stochastic settings
as well as information-theoretic settings to adapt the
observability notion to the control of linear systems with limited
information. One direction in this general field is the case
where the observations available at a controller arrive at
random intervals. In this context, in both the information
theory literature and the automatic control literature, a
rich collection of papers has studied the recursive estimation
problem and its applications in remote control [1], [2], [3].

In the following, we describe the system model. In Section
III, preliminaries on compressive sensing theory are presented.
A formal discussion of observability of linear systems follows:
since the analytical tools and results differ significantly
between cases, we first treat a deterministic setup
in Section IV and then study a stochastic setup in Section V.
Detailed proofs are given in Section VI. Concluding remarks
are given in Section VII.



II. Problem Formulation 

For the purpose of observability analysis, we consider the
following discrete-time linear time-invariant system (with zero
control input): $x_{t+1} = A x_t$, $y_t = \eta_t C x_t$, where
$t \in \mathbb{Z}_+$ denotes the discrete time instant,
$x_t \in \mathbb{R}^n$ and $y_t \in \mathbb{R}^{d_y}$
are the state of the system and the observation of the system
respectively, the matrices $A \in \mathbb{R}^{n \times n}$ and
$C \in \mathbb{R}^{d_y \times n}$ denote the state transfer matrix
and the observation matrix respectively, and $\eta_t$ takes the
value either 0 or 1 ($\eta_t = 1$ means an observation at time $t$
is available, and $\eta_t = 0$ otherwise).

The problem we are interested in is the observability of
a system with a sparse initial state: given $m < n$ observations
($m$ instances where $\eta_t = 1$), can we reconstruct
the initial state $x_0 \in \mathbb{R}^n$ exactly? Suppose that the
receiver observes the output of the system $y_t$ at the (stopping)
time instances $t_1, t_2, \cdots, t_m$. Let the overall observation
matrix be the stacked observation matrices

$$O_{T_m} = \left[ (CA^{t_1})^T, (CA^{t_2})^T, \cdots, (CA^{t_m})^T \right]^T$$

and the overall observation be

$$y_{T_m} = \left[ y_{t_1}^T, y_{t_2}^T, \cdots, y_{t_m}^T \right]^T,$$

where the subscript $T_m$ emphasizes that only the observations at
time instants $T_m := \{t_1, t_2, \cdots, t_m\}$ are available.
Then $y_{T_m} = O_{T_m} x_0$.
In order to infer the initial state $x_0$ from $y_{T_m}$, the columns
of $O_{T_m}$ have to be linearly independent, or equivalently, the
null-space of the matrix $O_{T_m}$ must be trivial.
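
The stacking of observations described above can be sketched numerically. The following is a minimal illustration, assuming NumPy; the function name `stacked_observation_matrix` and the 4-state example system are hypothetical and chosen only to show that fewer than n observations leave the null space non-trivial when sparsity is ignored.

```python
import numpy as np

def stacked_observation_matrix(A, C, times):
    """Stack the blocks C A^t, one per available observation time t."""
    return np.vstack([C @ np.linalg.matrix_power(A, t) for t in times])

# Hypothetical 4-state system with a scalar output (d_y = 1).
A = np.diag([0.5, -1.0, 2.0, 3.0])
C = np.ones((1, 4))
x0 = np.array([0.0, 0.0, 1.5, 0.0])   # 1-sparse initial state

times = [0, 2, 3]                      # only m = 3 < n observations
O = stacked_observation_matrix(A, C, times)
y = O @ x0                             # stacked observations

# With m < n the rank is at most m, so the columns of O cannot be
# linearly independent: x0 is not identifiable without a sparsity prior.
rank = np.linalg.matrix_rank(O)
```

Here `rank` is at most 3 while n = 4, which is exactly the situation the sparsity assumption is meant to address.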

While the general setup has been well understood, the
problem of particular interest to us is observability when
the initial state $x_0$ is sparse. The definition of a sparse vector
is given as follows.

Definition 1. Let $B \in \mathbb{R}^{n \times n}$ be an orthonormal
basis, i.e., $B$ contains $n$ orthonormal columns. A vector
$x \in \mathbb{R}^n$ is $K$-sparse under $B$ if $x = Bs$ for some
$s \in \mathbb{R}^n$ with $\|s\|_0 \le K$, where $\|s\|_0$ gives the
number of non-zero components in the vector $s$ ($\|\cdot\|_0$ is
often referred to as the $\ell_0$-norm, even though it is not a
well-defined norm).

Our formulation appears to be new in the control theory
literature, except for a paper [5] which considers a similar
setting for the observability properties of a stochastic model to
be considered later in the paper. The differences between the
approaches in the stochastic setup are presented in Section V.
Another related work is [6], which designs control algorithms
based on sparsity in the state, where compressive sensing tools
are used to reconstruct the state for control purposes.

III. Preliminaries and Compressive Sensing 

Compressive sensing is a signal processing technique that
encodes a signal $x$ of dimension $n$ by computing a measurement
vector $y$ of dimension $m \ll n$ via linear projections,
i.e., $y = \Phi x$, where $\Phi \in \mathbb{R}^{m \times n}$ is
referred to as the measurement matrix. In general, it is not
possible to uniquely recover the unknown signal $x$ from
measurements $y$ of reduced dimensionality. Nevertheless, if the
input signal is sufficiently sparse, exact reconstruction is
possible. In this context, suppose that the unknown signal
$x \in \mathbb{R}^n$ is at most $K$-sparse, i.e., that there are at
most $K$ nonzero entries in $x$. A naive reconstruction method is to
search among all possible signals and find the sparsest one which is
consistent with the linear measurements. This method requires only
$m = 2K$ random linear measurements, but finding the sparsest
signal representation is an NP-hard problem. On the other
hand, Donoho and Candès et al. [7], [8] demonstrated that
reconstruction of $x$ from $y$ is a polynomial time problem if
more measurements are taken. This is achieved by casting the
reconstruction problem as an $\ell_1$-minimization problem, i.e.,
$\min \|x\|_1$ subject to $y = \Phi x$, where
$\|x\|_1 = \sum_{i=1}^{n} |x_i|$ denotes the $\ell_1$-norm of the
vector $x$. It is a convex optimization problem and can be solved
efficiently by linear programming (LP) techniques. The
reconstruction complexity equals $O(m^2 n^{3/2})$ if the convex
optimization problem is solved using interior point methods [9].
More recently, an iterative algorithm, termed subspace pursuit (SP),
was proposed independently in [10] and [11]. The corresponding
computational complexity is $O(Km(n + K^2))$, which is significantly
smaller than that of $\ell_1$-minimization when $K \ll n$.
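
To make the cost of the naive method concrete, the sketch below performs the exhaustive search over supports, assuming NumPy. The function name `sparsest_solution`, the tolerance, and the small 2 x 4 example are illustrative assumptions; the loop visits a combinatorial number of supports, which is why this approach does not scale.

```python
import itertools
import numpy as np

def sparsest_solution(Phi, y, K, tol=1e-9):
    """Exhaustive search for a solution of Phi x = y with at most K non-zeros.

    Supports are tried in order of increasing size, so the first hit is a
    sparsest consistent signal. The number of supports grows combinatorially.
    """
    m, n = Phi.shape
    if np.linalg.norm(y) <= tol:
        return np.zeros(n)
    for k in range(1, K + 1):
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
            if np.linalg.norm(Phi[:, cols] @ coef - y) <= tol:
                x = np.zeros(n)
                x[cols] = coef
                return x
    return None  # no K-sparse signal is consistent with the measurements

# Hypothetical example with m = 2K = 2 measurements, n = 4, K = 1.
Phi = np.array([[1.0, 1.0, 1.0, 1.0],
                [1.0, 2.0, 3.0, 4.0]])
x_true = np.array([0.0, 3.0, 0.0, 0.0])
x_hat = sparsest_solution(Phi, Phi @ x_true, K=1)
```

In this example any two columns of `Phi` are linearly independent, so the 1-sparse solution is unique and the search returns `x_true`.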

A necessary and sufficient condition for $\ell_1$-minimization to
perform exact reconstruction is the so-called null-space
condition [12].

Theorem 2. The $\ell_1$-minimization reconstructs $x$ exactly if and
only if for all $w \in \mathbb{R}^n$ such that $\Phi w = 0$,
and for all sets $T \subset \{1, 2, \cdots, n\}$ such that $|T| = K$,
there exists a constant $c > 1$ such that

$$c \, \|w_T\|_1 \le \|w_{T^c}\|_1, \qquad (1)$$

where $T^c = \{1, 2, \cdots, n\} - T$.

A sufficient condition for both the $\ell_1$-minimization and SP
algorithms to perform exact reconstruction is based on the so-called
restricted isometry property (RIP) [8]. A matrix
$\Phi \in \mathbb{R}^{m \times n}$ is said to satisfy the Restricted
Isometry Property (RIP) with coefficients $(K, \delta)$ for
$K \le m$, $0 \le \delta \le 1$, if for all index sets
$I \subset \{1, \cdots, n\}$ such that $|I| \le K$ and for all
$q \in \mathbb{R}^{|I|}$, one has

$$(1-\delta)\|q\|_2^2 \le \|\Phi_I q\|_2^2 \le (1+\delta)\|q\|_2^2,$$

where $\Phi_I$ denotes the matrix formed by the columns of $\Phi$ with
indices in $I$. The RIP parameter $\delta_K$ is defined as the infimum
of all parameters $\delta$ for which the RIP holds. It was shown in
[10], [13], [14] that both $\ell_1$-minimization and SP algorithms
lead to exact reconstructions of $K$-sparse signals if the matrix
$\Phi$ satisfies the RIP with a constant parameter, i.e.,
$\delta_{kK} \le c_0$ where $c_0 \in (0, 1)$ and $k \in \mathbb{R}_+$
are independent of $K$. We note that different algorithms may have
different parameter values for $c_0$ and $k$. Examples of random and
deterministic RIP matrices can be found in [7], [8], [15], [16].
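
For very small matrices the RIP parameter can be evaluated directly from the definition. The sketch below assumes NumPy; the name `rip_constant` and the 2 x 3 example are hypothetical. It enumerates all supports of size K (smaller supports cannot give a larger deviation), so its cost grows combinatorially with n and it is meant only as an illustration of the definition, not as a practical test.

```python
import itertools
import numpy as np

def rip_constant(Phi, K):
    """Brute-force RIP parameter delta_K of Phi, assuming K <= m.

    For each support I of size K, the squared singular values of Phi_I
    bound ||Phi_I q||^2 / ||q||^2 from above and below.
    """
    n = Phi.shape[1]
    delta = 0.0
    for support in itertools.combinations(range(n), K):
        s = np.linalg.svd(Phi[:, list(support)], compute_uv=False)
        delta = max(delta, s[0] ** 2 - 1.0, 1.0 - s[-1] ** 2)
    return delta

# Hypothetical example: two orthonormal columns plus one correlated column.
r = 1.0 / np.sqrt(2.0)
Phi = np.array([[1.0, 0.0, r],
                [0.0, 1.0, r]])
delta_2 = rip_constant(Phi, 2)   # governed by the pairs involving column 3
```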

For later use, we also consider a particular class of
measurement matrices. We will assume that
$\Phi^T \in S_{n,m}(\mathbb{R})$ (that is, the rows of
$\Phi \in \mathbb{R}^{m \times n}$ are orthonormal) is isotropically
distributed (the definition of $S_{n,m}(\mathbb{R})$ and the isotropic
distribution on $S_{n,m}(\mathbb{R})$ will be introduced in
Section V-A). Under this assumption, it has been shown in [17] that if
the number of measurements satisfies $m \ge C \cdot K \log(n/K)$
for some positive constant $C$, then with high probability
($\ge 1 - e^{-cm}$ for some positive constant $c$) the
$\ell_1$-minimization perfectly reconstructs the unknown input
signal $x$.

IV. The Deterministic Model 

This section characterizes the number of measurements
needed for observability in different scenarios. We assume
that $x_0$ is $K$-sparse under a basis $B \in S_{n,n}(\mathbb{R})$
and that $B$ is known in advance. Recall that observability generally
requires that the observability matrix $O_{T_m}$ has full rank, i.e.,
at least $n$ measurements should be collected. When $x_0$ is sparse,
the number of observations required for observability can be
significantly reduced.

We start with a special case where particular structures are
imposed on $A$, $B$ and $C$ to reduce the number of required
observations to $2K + 1$.

Proposition 3. Suppose that $x_0$ is $K$-sparse under the natural
basis $B = I$. Assume that $A \in \mathbb{R}^{n \times n}$ is
diagonal, and that all diagonal entries are nonzero and distinct. Let
all of the entries of $C \in \mathbb{R}^{d_y \times n}$ ($d_y = 1$)
be non-zero. Then $x_0$ can be exactly reconstructed after exactly
$2K + 1$ measurements by algorithms with polynomial complexity in $n$.

Proof: See Section VI-A. ■

Remark 4. The reconstruction relies on the Reed-Solomon decoding
method presented in [18]. Note that the reconstruction
is not robust to noise and hence not very useful in practice.

The following proposition considers the case where
$\ell_1$-minimization is used for reconstruction. We impose further
restrictions on the initial state and the observation times.

Proposition 5. Let all of the entries of
$C \in \mathbb{R}^{d_y \times n}$ ($d_y = 1$) be non-zero. Suppose
$c_i x_{0,i} \ge 0$ for all $i$, where $C = [c_1, \cdots, c_n]$.
Further assume that $A \in \mathbb{R}^{n \times n}$ is diagonal, and
that all diagonal entries are nonzero. If the decoder receives
$2K + 1$ successive observations at times $t = 0, \ldots, 2K$,
the decoder can reconstruct the initial state perfectly, and
the unique solution can be obtained by solving the
linear program $\min \|x\|_1$ s.t. $O_t x = y$, where
$O_t = \left[ C^T, (CA)^T, \cdots, (CA^{2K})^T \right]^T$.

Proof: See Section VI-B. ■

We note that one can relax the above to the case when the
observations are periodic, i.e.,
$t_2 - t_1 = t_3 - t_2 = \cdots = t_m - t_{m-1}$, where
$t_1, t_2, \ldots, t_m$ are the observation times.
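
The linear program in Proposition 5 can be sketched as follows, assuming NumPy and SciPy's `linprog` with the HiGHS backend. The helper name `l1_minimize`, the standard split x = u - v with u, v >= 0, and the 5-state example (unit observation weights, one active state) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(O, y):
    """Solve min ||x||_1 s.t. O x = y via the split x = u - v, u, v >= 0."""
    m, n = O.shape
    cost = np.ones(2 * n)                     # sum(u) + sum(v) = ||x||_1 at optimum
    A_eq = np.hstack([O, -O])                 # O (u - v) = y
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

# Hypothetical diagonal system: distinct non-zero eigenvalues, C = all ones.
lam = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
c = np.ones(5)
K = 1
x0 = np.array([0.0, 0.0, 2.0, 0.0, 0.0])      # K-sparse, c_i * x0_i >= 0

# Rows C A^t for t = 0, ..., 2K; for diagonal A this is c * lam**t.
O = np.vstack([c * lam ** t for t in range(2 * K + 1)])
y = O @ x0
x_hat = l1_minimize(O, y)
```

In this instance the first row of `O` is all ones, so any minimizer must be non-negative, and matching the first three moments then pins down the single active state; `x_hat` agrees with `x0` up to solver tolerance.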

In the following, we consider more general settings. 

Proposition 6. Suppose that $A \in \mathbb{R}^{n \times n}$ is in
Jordan canonical form, all diagonal entries are nonzero, and the
eigenvalues corresponding to different Jordan blocks are distinct.
Let the entries of $C \in \mathbb{R}^{d_y \times n}$ ($d_y = 1$) be
non-zero for all the leading components of Jordan blocks (that is,
for the first entry corresponding to a Jordan block). If the decoder
receives $m$ random observations, at random times
$T_m = \{t_1, t_2, \ldots, t_m\}$, let
$O_{T_m} = \left[ (CA^{t_1})^T, \cdots, (CA^{t_m})^T \right]^T$. Let
$O_{T_m}(i)$ denote the $i$-th column of $O_{T_m}$ for
$1 \le i \le n$. Define

$$\mu = \sup_{i \ne j} \left| \left\langle \frac{1}{\|O_{T_m}(i)\|_2} O_{T_m}(i), \; \frac{1}{\|O_{T_m}(j)\|_2} O_{T_m}(j) \right\rangle \right|.$$

Then $x_0$ can be exactly reconstructed after $m$ measurements
by algorithms with polynomial complexity in $n$, provided that $\mu$
satisfies the coherence condition of Theorem 3 of [25] invoked in
Section VI-C. In particular, a linear program (LP) can be used to
recover the initial state.

Proof: See Section VI-C. ■
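
The column coherence that appears in Proposition 6 is cheap to evaluate, in contrast to the null-space and RIP conditions. The sketch below assumes NumPy; the names `mutual_coherence` and `coherence_sparsity_bound` are hypothetical. The second function encodes the classical coherence bound K < (1 + 1/mu)/2 known from the sparse approximation literature; it is included as an assumption about the kind of condition involved and may differ from the exact statement of Theorem 3 of [25].

```python
import numpy as np

def mutual_coherence(O):
    """Largest |inner product| between distinct unit-normalized columns.

    Assumes that O has no zero column.
    """
    cols = O / np.linalg.norm(O, axis=0)
    gram = np.abs(cols.T @ cols)
    np.fill_diagonal(gram, 0.0)      # ignore each column paired with itself
    return float(gram.max())

def coherence_sparsity_bound(mu):
    """Classical sufficient sparsity level: recovery if K < (1 + 1/mu) / 2."""
    return 0.5 * (1.0 + 1.0 / mu)

# Hypothetical 2 x 3 observation matrix.
O = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
mu = mutual_coherence(O)
bound = coherence_sparsity_bound(mu)
```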

Remark 7. We recall that the observability of a linear system
described by the pair $(A, C)$ can be verified by the
following criterion, known as the Hautus-Rosenbrock test:
the pair is observable if and only if for all
$\lambda \in \mathbb{C}$, the matrix

$$\begin{bmatrix} \lambda I - A \\ C \end{bmatrix}$$

is full rank. Clearly, one needs
to check the rank condition only for the eigenvalues of $A$. It
is a consequence of the above that, if the component of $C$
corresponding to the first entry of a Jordan block is zero, then
the corresponding component cannot be recovered even with
$n$ successive observations, since this is a necessary condition
for observability.
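
The rank test in Remark 7 can be checked numerically on a single Jordan block. The following sketch assumes NumPy; the function name `is_observable_pbh` and the absolute rank tolerance are illustrative choices (the tolerance guards against small numerical errors in the computed eigenvalues of a defective matrix).

```python
import numpy as np

def is_observable_pbh(A, C, tol=1e-6):
    """Hautus test: [lam*I - A; C] must have rank n at every eigenvalue."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        M = np.vstack([lam * np.eye(n) - A, C])
        if np.linalg.matrix_rank(M, tol=tol) < n:
            return False
    return True

# One 2 x 2 Jordan block with eigenvalue 1.
J = np.array([[1.0, 1.0],
              [0.0, 1.0]])
obs_leading = is_observable_pbh(J, np.array([[1.0, 0.0]]))   # leading entry of C non-zero
obs_trailing = is_observable_pbh(J, np.array([[0.0, 1.0]]))  # leading entry of C zero
```

With a non-zero leading entry the pair passes the test, while a zero leading entry makes it fail, in line with the remark.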

A more general case is studied in the next proposition. 



Proposition 8. Given $A \in \mathbb{R}^{n \times n}$,
$C \in \mathbb{R}^{d_y \times n}$ and $T_m = \{t_1, \cdots, t_m\}$,
if $\Phi = O_{T_m} B$ satisfies the null-space condition
(1), then the $\ell_1$-minimization $\min \|s\|_1$ s.t.
$y_{T_m} = O_{T_m} B s$ reconstructs $s$ and $x_0 = Bs$ exactly. If
$\Phi$ satisfies the RIP with proper parameters, both
$\ell_1$-minimization and the SP algorithm lead to exact
reconstruction of the initial state $x_0$.

This proposition is a direct application of the results
presented in Section III. This result implies a protocol in which
one keeps collecting available observations
$y_{t_1}, y_{t_2}, \cdots$ until the null-space or RIP condition is
satisfied. However, the computational complexity of verifying either
of them generally increases exponentially with $n$. There are two
approaches to avoid this extremely expensive computational cost. The
first approach is reconstruction on the fly: one tries to reconstruct
the unknown initial state $x_0$ every time a certain number
of new observations are received, and continues this process
until the reconstruction is good enough. In the second
approach, certain suboptimal but computationally more efficient
conditions, for example the incoherence condition, are
employed to judge whether the current observations are sufficient for
reconstruction.

V. The Stochastic Model 

In this section, we discuss a stochastic model for the system
matrices. One advantage of the stochastic model is that it
helps in understanding more general cases that are difficult
to analyze using the deterministic model. Examples include
Theorem 12 and Corollary 14. Our analysis is based on the
concept of rotational invariance, defined in Subsection V-A.
The intuition is that rotational invariance provides a rich
structure to "mix" the non-zeros in the initial state, and this
"mixing" ensures observability with a significantly reduced
number of measurements.

During the preparation of this paper, we noticed that the
stochastic model was also discussed in an independent work
[5]. The major differences between our approach and that in
[5] are as follows. First, in [5], the observation matrices $C_k$
are assumed to be random Gaussian matrices. In contrast,
our model relies on rotationally invariant random matrices,
which are much more general. Second, though the work [5]
targets a general state transition matrix $A$, its analysis
and results are best suited to $A$ matrices with concentrated
spectrum, for example unitary matrices. By comparison, in
our stochastic model, we separate the rotational invariance and
the spectral property, and hence the spectral property can be
very much relaxed.

A. The Isotropy of Random Matrices 

To define rotational invariance, we need to define the
set of rotational matrices, often referred to as the Stiefel
manifold. Formally, the Stiefel manifold $S_{n,k}(\mathbb{R})$ is
defined as
$S_{n,k}(\mathbb{R}) = \{U \in \mathbb{R}^{n \times k} : U^T U = I_k\}$,
where $I_k$ is the $k \times k$ identity matrix. When $n = k$, a
matrix in $S_{n,n}(\mathbb{R})$ is an orthonormal matrix and
represents a rotation. A left rotation of a measurable set
$\mathcal{H} \subset \mathbb{R}^{m \times n}$ under a given rotation
represented by $A \in S_{m,m}(\mathbb{R})$ is given by the set
$A\mathcal{H} = \{AH : H \in \mathcal{H}\} \subset \mathbb{R}^{m \times n}$.
One similarly defines the right rotation of $\mathcal{H}$ given by
$\mathcal{H}B$ for a given $B \in S_{n,n}(\mathbb{R})$. An
invariant/isotropic probability measure $\mu_I$ [19], [20, Sections 2
and 3] is defined by the property that for any measurable set
$\mathcal{M} \subset \mathbb{R}^{n \times k}$ and rotation
matrices $A \in S_{n,n}(\mathbb{R})$ and $B \in S_{k,k}(\mathbb{R})$,
$\mu_I(\mathcal{M}) = \mu_I(A\mathcal{M}) = \mu_I(\mathcal{M}B)$.
The invariant probability on the Stiefel manifold is essentially the
uniform probability measure, i.e.,
$\mu_I(\{A \in S_{n,k}(\mathbb{R}) : \|A - U\|_F < \epsilon\})$ is
independent of the choice of $U \in S_{n,k}(\mathbb{R})$.
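
A common way to draw a matrix from the uniform (isotropic) distribution on the Stiefel manifold is to orthonormalize a Gaussian matrix. The sketch below assumes NumPy; the function name `sample_isotropic` is hypothetical, and the sign correction on the QR factor is the standard device for removing the sign ambiguity of the factorization. It is offered as an illustration of the definition above, not as part of the original analysis.

```python
import numpy as np

def sample_isotropic(n, k, rng):
    """Draw U in S_{n,k}(R) from the rotation-invariant distribution.

    QR-factorize an n x k standard Gaussian matrix and flip column signs
    so that the diagonal of R is positive (almost surely non-zero).
    """
    G = rng.standard_normal((n, k))
    Q, R = np.linalg.qr(G)              # reduced QR: Q is n x k
    return Q * np.sign(np.diag(R))      # rescale each column by +/- 1

rng = np.random.default_rng(0)
U = sample_isotropic(5, 3, rng)
gram = U.T @ U                          # should equal I_k up to round-off
```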

The main results in this subsection are Lemmas 9 and
10, which show that a rotationally invariant random matrix
admits rotationally invariant matrix products and decompositions.
These results are the key to proving the results regarding
observability in Subsection V-B.

Lemma 9. Let $A \in S_{n,k}(\mathbb{R})$ be isotropically
distributed. Let $B \in S_{n,n}(\mathbb{R})$ be random and
independent of $A$. Let $C = BA$. Then $C \in S_{n,k}(\mathbb{R})$
is isotropically distributed and independent of $B$.

Proof: In order to show that $C$ is independent of $B$, it
is sufficient to show that for a given arbitrary
$B \in S_{n,n}(\mathbb{R})$ and an arbitrary measurable set
$\mathcal{M} \subset S_{n,k}(\mathbb{R})$, the conditional
probability $\Pr(C \in \mathcal{M} \mid B)$ is independent of $B$.
This can be verified by observing

$$\Pr(C \in \mathcal{M} \mid B) = \Pr(A \in B^{-1}\mathcal{M} \mid B) \overset{(a)}{=} \Pr(A \in B^{-1}\mathcal{M}) \overset{(b)}{=} \Pr(A \in \mathcal{M}),$$

where (a) follows from the fact that $A$ is independent of $B$,
and (b) comes from the facts that $A$ is isotropically distributed
and that $B \in S_{n,n}(\mathbb{R})$ and hence
$B^{-1} = B^T \in S_{n,n}(\mathbb{R})$.
This proves the lemma. ■

Let $H \in \mathbb{R}^{n \times n}$ be a standard Gaussian random
matrix, i.e., the entries of $H$ are independent and identically
distributed Gaussian random variables with zero mean and unit
variance. Consider the Jordan matrix decomposition $H = PJP^{-1}$,
where $J$ is often referred to as the Jordan normal form of $H$.
Let $P = U_P \Lambda_P V_P^T$ be the singular value decomposition of
$P$, where $\Lambda_P$ is the diagonal matrix composed of the singular
values of $P$. Then $P^{-1} = V_P \Lambda_P^{-1} U_P^T$. The following
lemma states that the orthogonal matrix $U_P$ is isotropically
distributed.

Lemma 10. Let $H \in \mathbb{R}^{n \times n}$ be a standard Gaussian
random matrix, let $H = PJP^{-1}$ be the corresponding Jordan matrix
decomposition, and let $P = U_P \Lambda_P V_P^T$ be the singular value
decomposition of $P$. Then $U_P \in S_{n,n}(\mathbb{R})$ is
isotropically distributed and independent of $J$, $\Lambda_P$ and
$V_P$.

Proof: According to the statement of this lemma, $H$ is a
standard Gaussian random matrix. Hence, the distribution of
$H$ is left and right rotationally invariant [21], [22, pg. 37]. That
is, for measurable sets $\mathcal{H} \subset \mathbb{R}^{n \times n}$
and arbitrary $Q \in S_{n,n}(\mathbb{R})$,
$\Pr(H \in \mathcal{H}) = \Pr(H \in Q\mathcal{H}) = \Pr(H \in \mathcal{H}Q)$,
and therefore $\Pr(H \in \mathcal{H}) = \Pr(H \in Q\mathcal{H}Q^T)$.
To simplify the notation, let $H = U_P B U_P^T$, where
$B = \Lambda_P V_P^T J V_P \Lambda_P^{-1}$.
Let $\mathcal{U} \subset S_{n,n}(\mathbb{R})$ be an arbitrary
measurable set of $U_P$. Let $\Pr(\mathcal{U})$ be the probability
measure of $U_P$ induced from the probability measure of $H$.

The isotropy of $U_P$ means that
$\Pr(U_P \in \mathcal{U}) = \Pr(U_P \in Q\mathcal{U})$ for an
arbitrarily given $Q \in S_{n,n}(\mathbb{R})$.
To this end, note that
$\Pr(U_P \in \mathcal{U}) = \Pr\{H : \exists U_P \in \mathcal{U} \text{ s.t. } H = U_P B U_P^T\}$, and

$$\Pr(U_P \in Q\mathcal{U}) = \Pr\{H' : \exists U_P' \in Q\mathcal{U} \text{ s.t. } H' = U_P' B U_P'^T\} = \Pr\{H' : \exists U_P \in \mathcal{U} \text{ s.t. } H' = Q (U_P B U_P^T) Q^T\}.$$

In other words, for any $H$ that induces a $U_P \in \mathcal{U}$,
$QHQ^T$ induces a $U_P' \in Q\mathcal{U}$, and vice versa. Because we
have shown $\Pr(H \in \mathcal{H}) = \Pr(H \in Q\mathcal{H}Q^T)$, we
conclude that $U_P$ is isotropically distributed. Furthermore, the
above argument also suggests that $U_P$ is independent of the matrix
$B$, and therefore independent of $J$, $\Lambda_P$ and $V_P$. This
proves the lemma. ■

Remark 11. Although Lemma 10 only treats standard Gaussian
random matrices, the same result holds for general
random matrix ensembles whose distributions are left and right
rotationally invariant; the proof of Lemma 10 carries over.

B. Results for Stochastic Models 

Recall that a general linear system is observable if and
only if the observability matrix $O_{T_m}$ has full row rank. One
may expect that the row rank of $O_{T_m}$ still indicates the
observability of a linear system with sparse initial state and
partial observations. The next theorem confirms the intimate
relation between the row rank and the observability. The
difference between our results and the standard results is that
the required minimum rank is much smaller than the signal
dimension $n$ in our setting.

Theorem 12. Suppose that $A \in \mathbb{R}^{n \times n}$ and
$C \in \mathbb{R}^{d_y \times n}$ are independently drawn from a
random matrix ensemble whose distribution is left and right
rotationally invariant. Let $r$ be the row rank of the overall
observation matrix $O_{T_m}$. If $r \ge O(K \log \frac{n}{K})$, then
the $\ell_1$-minimization method perfectly reconstructs $x_0$ from
$y_t = O_t x_0$ (where we write $t = T_m$ for notational
convenience) with high probability (at least $1 - e^{-cr}$
for some positive constant $c$ independent of $n$ and $r$).

The proof of Theorem 12 rests on the following lemma.

Lemma 13. Assume the same setup as in Theorem 12 and
let $t = T_m$ for notational convenience. Let
$O_t = U_t \Lambda_t V_t^T$ be the corresponding singular value
decomposition, where $U_t \in S_{m d_y, m d_y}(\mathbb{R})$ and
$V_t \in S_{n,n}(\mathbb{R})$ are the left and right
singular vector matrices respectively. Then $V_t$ is isotropically
distributed and independent of $U_t$ and $\Lambda_t$.

While Lemma 13 is proved in Section VI-D, the detailed
proof of Theorem 12 is presented in Section VI-E. The detailed
reconstruction procedure using $\ell_1$-minimization is explicitly
presented in the proof.

The next corollary presents a special case where the diago- 
nal form is involved. 

Corollary 14. Suppose that $A \in \mathbb{R}^{n \times n}$ and
$C \in \mathbb{R}^{d_y \times n}$ ($d_y = 1$) are independently drawn
from random matrix ensembles whose distributions are left and right
rotationally invariant. Suppose that the Jordan normal form
$J = P^{-1}AP$ is diagonal with distinct diagonal entries with
probability one. Then after $m \ge O(K \log \frac{n}{K})$
measurements, the $\ell_1$-minimization method perfectly reconstructs
$x_0$ with high probability (at least $1 - e^{-cm}$ for some positive
constant $c$).

Proof: See Section VI-F. ■

Astute readers may ask whether there exists a random matrix
ensemble such that the random sample $A$ satisfies the conditions
required in Corollary 14. In fact, if $A = HH^T$ where
$H \in \mathbb{R}^{n \times n}$ is a standard Gaussian random matrix,
then all the conditions required for $A$ hold. This corollary
guarantees that blindly collecting $m \ge O(K \log \frac{n}{K})$
observations is sufficient for perfect reconstruction with high
probability.

VI. Proofs 

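A. Proof of Proposition 3

Let $A = \mathrm{diag}(\lambda)$ where
$\lambda = [\lambda_1, \lambda_2, \cdots, \lambda_n]^T$ is the
vector containing the diagonal entries of $A$. Let $c_i$ denote
the $i$-th entry of the row vector $C$. Then,
$CA^{t_j} = [c_1\lambda_1^{t_j}, c_2\lambda_2^{t_j}, \cdots, c_n\lambda_n^{t_j}] = [\lambda_1^{t_j}, \lambda_2^{t_j}, \cdots, \lambda_n^{t_j}] \, \mathrm{diag}(C)$,
where $\mathrm{diag}(C)$ is the diagonal matrix whose $i$-th diagonal
entry is $c_i$. Hence,

$$y_t = \Lambda_t \, \mathrm{diag}(C) \, x_0, \qquad \Lambda_t := \begin{bmatrix} \lambda_1^{t_1} & \lambda_2^{t_1} & \cdots & \lambda_n^{t_1} \\ \lambda_1^{t_2} & \lambda_2^{t_2} & \cdots & \lambda_n^{t_2} \\ \vdots & \vdots & & \vdots \\ \lambda_1^{t_m} & \lambda_2^{t_m} & \cdots & \lambda_n^{t_m} \end{bmatrix}.$$

Since all the entries of $C$ are non-zero, $\mathrm{diag}(C) x_0$ is
$K$-sparse under the natural basis. On the other hand, since
$\lambda_1, \lambda_2, \cdots, \lambda_n$ are all distinct, the matrix
$\Lambda_t$ is a truncation of the full rank Vandermonde matrix [23].
Now according to the Reed-Solomon decoding method presented in [18]
and the corresponding proof, as long as $m \ge 2K + 1$, one can
exactly reconstruct $\mathrm{diag}(C) x_0$, and therefore $x_0$, from
$y_t$ with a number of algebraic operations polynomial in $n$. This
proposition is therefore proved.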

B. Proof of Proposition 5

We first consider the case when $A$ is diagonal. Since
$A$ is diagonal, it is of the form
$A = \mathrm{diag}([\lambda_1, \cdots, \lambda_n])$.
Furthermore, assume that $C = [c_1, \cdots, c_n]$ is a row vector.
With $m$ successive observations, we have a linear system
described by

$$y_t = M \, \mathrm{diag}([c_1, \cdots, c_n]) \, x_0, \qquad M := \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n \\ \vdots & \vdots & & \vdots \\ \lambda_1^{m-1} & \lambda_2^{m-1} & \cdots & \lambda_n^{m-1} \end{bmatrix}.$$

Define $z \in \mathbb{R}^n$ such that $z_i = c_i x_{0,i} \ge 0$. Then
the corresponding $\ell_1$-minimization problem becomes

$$\min \|z\|_1 \quad \text{subject to} \quad y_t = Mz. \qquad (2)$$

Once we solve the above optimization problem, it is clear that
$x_{0,i} = z_i / (\lambda_i^{t_1} c_i)$ where $t_1 = 0$.

For this case, we first show that the $\ell_1$-minimization has
a unique solution. Via duality theory, for a constrained
minimization problem of a convex function with an equality
constraint, the minimization has a unique solution if one
can find a Lagrange multiplier (in the dual space) for which
the Lagrangian at the solution is locally stationary. More
specifically, let $M_i$ be the $i$-th column of the matrix $M$.
Let $i_1, \cdots, i_K$ be the indices of the nonzero entries of $x_0$.
Clearly, $i_1, \cdots, i_K$ are also the indices of the nonzero
entries of the corresponding
$z = \mathrm{diag}([\lambda_1^{t_1} c_1, \cdots, \lambda_n^{t_1} c_n]) \, x_0$.
If there exists a vector $g \in \mathbb{R}^m$ so that

$$\langle g, M_i \rangle = 1 \quad \forall i \in \{i_1, i_2, \cdots, i_K\}, \qquad \langle g, M_i \rangle < 1 \quad \forall i \notin \{i_1, i_2, \cdots, i_K\},$$

then the duality theory implies that the optimization problem
in (2) has a unique minimizer that is $K$-sparse and has nonzero
entries at indices $i_1, \cdots, i_K$.

In the following we construct a subdifferential which is
essentially what Fuchs constructed in [24]. Consider a polynomial
in $\lambda$ of the form

$$P(\lambda) = \prod_{j=1}^{K} (\lambda_{i_j} - \lambda)^2 = a_0 \lambda^{2K} + a_1 \lambda^{2K-1} + \cdots + a_{2K}.$$

It is clear that

$$P(\lambda_i) = 0 \quad \forall i \in \{i_1, i_2, \cdots, i_K\}, \qquad P(\lambda_i) > 0 \quad \forall i \notin \{i_1, i_2, \cdots, i_K\},$$

where the inequality holds since the $\lambda_i$'s are distinct. Let
$f \in \mathbb{R}^m$, with $m = 2K + 1$, be the coefficient vector
$f = [a_{2K}, a_{2K-1}, \cdots, a_0]^T$. It can be verified that the
inner product
$\langle f, [1, \lambda_i, \cdots, \lambda_i^{m-1}]^T \rangle = P(\lambda_i)$.
Now, define a vector $g \in \mathbb{R}^m$ as
$g = [1, 0, 0, \cdots, 0]^T - f$. Then
$\langle g, [1, \lambda_i, \cdots, \lambda_i^{m-1}]^T \rangle = 1 - P(\lambda_i)$.
The vector $g$ is the desired Lagrange vector. Hence
the optimization problem (2) has a unique minimizer.

What now needs to be shown is that there is a unique
solution to the original problem under the $\ell_0$ constraint. In
other words, we wish to show that there is a unique $K$-sparse
$z$ such that $y_t = Mz$. Now, let there be another $K$-sparse
solution $z'$. Then, $M(z - z') = 0$. But, since any $2K$ columns
of the Vandermonde matrix $M$ are linearly independent, $z - z'$
has to be the zero vector. Hence, this ensures that the $\ell_1$
solution found is the sought $\ell_0$ solution. ■
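
The polynomial certificate constructed in this proof can be checked numerically. The sketch below assumes NumPy; the eigenvalues, the support, and all variable names are hypothetical. It builds the coefficient vector of P, forms g, and evaluates the inner products with the columns of the Vandermonde matrix M.

```python
import numpy as np

lam = np.array([0.5, 1.0, 1.5, 2.0, 2.5])   # hypothetical distinct eigenvalues
support = [2]                                # indices i_1, ..., i_K (here K = 1)
K = len(support)
m = 2 * K + 1

# Row t of M is [lam_1^t, ..., lam_n^t] for t = 0, ..., 2K.
M = np.vander(lam, m, increasing=True).T

# np.poly gives the coefficients of prod_j (lam - lam_{i_j})^2, highest
# power first; reversing orders them by increasing power, matching M.
f = np.poly(np.repeat(lam[support], 2))[::-1]

e1 = np.zeros(m)
e1[0] = 1.0
g = e1 - f

certificate = g @ M                          # entry i equals 1 - P(lam_i)
```

For this example `certificate` equals 1 at the support index and is strictly below 1 elsewhere, which is the pair of conditions required of the vector g above.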

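C. Proof of Proposition 6

We now discuss the result for a Jordan matrix $A$. Observe
that, for a Jordan block with eigenvalue $\lambda_1$,

$$\begin{bmatrix} \lambda_1 & 1 & \\ & \lambda_1 & 1 \\ & & \lambda_1 \end{bmatrix}^{t} = \begin{bmatrix} \lambda_1^{t} & \binom{t}{1}\lambda_1^{t-1} & \binom{t}{2}\lambda_1^{t-2} \\ & \lambda_1^{t} & \binom{t}{1}\lambda_1^{t-1} \\ & & \lambda_1^{t} \end{bmatrix}.$$

Thus, it follows that if $A$ is in Jordan form, the random
observation matrix can be written as

$$M = \begin{bmatrix} c_1\lambda_1^{t_1} & c_1 t_1 \lambda_1^{t_1 - 1} + c_2\lambda_1^{t_1} & \cdots \\ \vdots & \vdots & \\ c_1\lambda_1^{t_m} & c_1 t_m \lambda_1^{t_m - 1} + c_2\lambda_1^{t_m} & \cdots \end{bmatrix}.$$

If $c_1$ is non-zero, and the entries corresponding to leading
entries of Jordan blocks are non-zero, the columns of
the matrix become linearly independent. By multiplying the
initial condition with a diagonal matrix, we can normalize the
columns such that the $\ell_2$ norm of each column is equal to 1.

The rest of the proof now follows from Theorem 3 of [25].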



D. Proof of Lemma 13

Consider the Jordan decomposition $A = PJP^{-1}$ and the
singular value decomposition $P = U_P \Lambda_P V_P^T$. It is clear
that $P^{-1} = V_P \Lambda_P^{-1} U_P^T$. For notational compactness,
let $\tilde{A} = \Lambda_P V_P^T J V_P \Lambda_P^{-1}$ so that
$A = U_P \tilde{A} U_P^T$. It is elementary to verify that
$A^{t_i} = U_P \tilde{A}^{t_i} U_P^T$. Hence,

$$O_t = \begin{bmatrix} CA^{t_1} \\ \vdots \\ CA^{t_m} \end{bmatrix} = \begin{bmatrix} C U_P \tilde{A}^{t_1} \\ \vdots \\ C U_P \tilde{A}^{t_m} \end{bmatrix} U_P^T.$$

We shall show that $U_P$ is independent of both $\tilde{A}$ and
$CU_P$. Since $A$ is left and right rotation-invariantly
distributed, according to Remark 11, $U_P$ is isotropically
distributed and independent of $\tilde{A}$. In order to show that
$U_P$ is independent of $CU_P$, we resort to the singular value
decomposition $C = U_C \Lambda_C V_C^T$. Since $C$ is right
rotation-invariantly distributed, $V_C$ is isotropically distributed.
Thus $\tilde{V}_C^T := V_C^T U_P$ is isotropically distributed and
independent of $U_P$ according to Lemma 9. As a result,
$CU_P = U_C \Lambda_C \tilde{V}_C^T$ is independent of $U_P$. Write
$O_t = \tilde{O}_t U_P^T$, where
$\tilde{O}_t = \left[ (CU_P\tilde{A}^{t_1})^T, \cdots, (CU_P\tilde{A}^{t_m})^T \right]^T$.
Since $U_P$ is independent of both $\tilde{A}$ and $CU_P$, $U_P$ is
independent of $\tilde{O}_t$.
Write the singular value decompositions of $O_t$ and $\tilde{O}_t$ as
$O_t = U_t \Lambda_t V_t^T$ and
$\tilde{O}_t = U_t \Lambda_t \tilde{V}_t^T$. Clearly
$V_t = U_P \tilde{V}_t$.
Since $U_P$ is isotropically distributed and independent of
$\tilde{O}_t$, $V_t = U_P \tilde{V}_t$ is isotropically distributed
and independent of both $\Lambda_t$ and $U_t$ according to Lemma 9.
This completes the proof.

E. Proof of Theorem 12

We transfer the considered reconstruction problem
to the standard compressive sensing reconstruction. Let
$\lambda_1, \lambda_2, \cdots, \lambda_r$ be the $r$ non-zero singular
values of $O_t$ and
$\lambda = [\lambda_1, \lambda_2, \cdots, \lambda_r]^T$. The singular
value decomposition of $O_t$ can be written in the form

$$O_t = U_t \begin{bmatrix} \mathrm{diag}(\lambda) & 0 \\ 0 & 0 \end{bmatrix} V_t^T,$$

where $\mathrm{diag}(\lambda)$ is the diagonal matrix generated from
$\lambda$. Note that

$$U_t^T y_t = \begin{bmatrix} \mathrm{diag}(\lambda) & 0 \\ 0 & 0 \end{bmatrix} V_t^T x_0.$$

The $r + 1, r + 2, \cdots, m$ entries of $U_t^T y_t$ are zeros: they
do not carry any information about $x_0$. Define $\tilde{y}_t$ to be
the vector containing the first $r$ entries of $U_t^T y_t$. We have
$\tilde{y}_t = [\, \mathrm{diag}(\lambda) \;\; 0 \,] V_t^T x_0$ and
therefore

$$\mathrm{diag}(\lambda)^{-1} \tilde{y}_t = [\, I_r \;\; 0 \,] V_t^T x_0 = [\, I_r \;\; 0 \,] V_t^T B s, \qquad (3)$$

where $I_r$ is the $r \times r$ identity matrix.

The unknown $s$ ($K$-sparse) can be reconstructed by
$\ell_1$-minimization with high probability. Since $V_t$ is
isotropically distributed and independent of $B$, the matrix
$V_t^T B$ is isotropically distributed. The matrix
$([\, I_r \;\; 0 \,] V_t^T B)^T \in S_{n,r}(\mathbb{R})$, containing
the first $r$ rows of $V_t^T B$ as columns,
is therefore isotropically distributed. Provided that
$r \ge O(K \log(n/K))$, the unknown signal $s$ can be exactly
reconstructed from $\mathrm{diag}(\lambda)^{-1} \tilde{y}_t$ via
$\ell_1$-minimization [17]. Theorem 12 is proved.

Remark 15. The reconstruction procedure involves singular
value decomposition, matrix multiplication, and
$\ell_1$-minimization. The numbers of algebraic operations required
for all these steps are polynomial in $n$. Hence, the complexity of
the whole reconstruction process is polynomial in $n$.
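
The reduction in equation (3) can be traced step by step in code. The sketch below assumes NumPy; the Gaussian test matrices, the scaling of A, the observation times, and the variable names are hypothetical. It stops at the reduced system, which could then be handed to any $\ell_1$ solver.

```python
import numpy as np

rng = np.random.default_rng(1)
n, times = 6, [0, 1, 3]

# Hypothetical Gaussian system matrices (A scaled to keep its powers moderate).
A = rng.standard_normal((n, n)) / np.sqrt(n)
C = rng.standard_normal((1, n))
B = np.eye(n)                         # natural sparsity basis
s = np.zeros(n)
s[4] = 2.0                            # 1-sparse coefficient vector
x0 = B @ s

O = np.vstack([C @ np.linalg.matrix_power(A, t) for t in times])
y = O @ x0

U, sing, Vt = np.linalg.svd(O)        # full SVD: U is m x m, Vt is n x n
r = np.linalg.matrix_rank(O)
y_tilde = (U.T @ y)[:r]               # first r entries of U^T y
lhs = y_tilde / sing[:r]              # diag(lambda)^{-1} y_tilde
Phi = Vt[:r, :] @ B                   # [I_r 0] V^T B, with orthonormal rows
```

By construction `lhs` equals `Phi @ s`, and the rows of `Phi` are orthonormal, which is the standard compressive sensing form used in the proof.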



F. Proof of Corollary 14

Since both $A$ and $C$ are left and right rotation-invariantly
distributed, Theorem 12 can be applied. Let $A = PJP^{-1}$ be
a Jordan decomposition. Corollary 14 holds if

$$O_t = \begin{bmatrix} CA^{t_1} \\ CA^{t_2} \\ \vdots \\ CA^{t_m} \end{bmatrix} = \begin{bmatrix} CPJ^{t_1} \\ CPJ^{t_2} \\ \vdots \\ CPJ^{t_m} \end{bmatrix} P^{-1}$$

is full row ranked with probability one, i.e.,
$\mathrm{rank}(O_t) = m \ge O(K \log \frac{n}{K})$ with probability
one.

Suppose that the Jordan normal form $J = P^{-1}AP$ is
diagonal. Denote the $i$-th diagonal entry of $J$ by $J_i$. Note that

$$CPJ^{t_j} = \left[ (CP)_1 J_1^{t_j}, (CP)_2 J_2^{t_j}, \cdots, (CP)_n J_n^{t_j} \right] = \left[ J_1^{t_j}, J_2^{t_j}, \cdots, J_n^{t_j} \right] \mathrm{diag}(CP),$$

where $\mathrm{diag}(CP)$ is the diagonal matrix generated from the
row vector $CP$. Define

$$J_{V,t} = \begin{bmatrix} J_1^{t_1} & J_2^{t_1} & \cdots & J_n^{t_1} \\ J_1^{t_2} & J_2^{t_2} & \cdots & J_n^{t_2} \\ \vdots & \vdots & & \vdots \\ J_1^{t_m} & J_2^{t_m} & \cdots & J_n^{t_m} \end{bmatrix}.$$

Then $O_t = J_{V,t} \, \mathrm{diag}(CP) \, P^{-1}$. Note that
$J_{V,t}$ is composed of $m$ rows of the Vandermonde matrix

$$J_V = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ J_1 & J_2 & \cdots & J_n \\ J_1^2 & J_2^2 & \cdots & J_n^2 \\ \vdots & \vdots & & \vdots \end{bmatrix}.$$

The matrix $J_{V,t}$ has full row rank. By definition of $P$,
$P^{-1}$ has full rank as well. Therefore, $O_t$ has full row rank if
and only if $CP$ does not contain any zero entries.

The fact that the row vector $CP$ does not contain any zero
entries holds with probability one. This fact will be established
by the isotropy of $C$. Let $P_{\cdot j}$ denote the $j$-th column of
$P$. Since $P$ is full rank, $P_{\cdot j} \ne 0$ for all
$j = 1, 2, \cdots, n$. By assumption, $C$ is isotropically
distributed. This implies that $CP_{\cdot j} \ne 0$ with probability
one [20]. $CP$ is composed of finitely many columns. It follows that
with probability one, no entry of $CP$ is zero.

So far, we have proved that $O_t$ has full row rank with
probability one if the Jordan normal form $J = P^{-1}AP$ is
diagonal. Note that by assumption, the Jordan normal form
is diagonal with probability one. We have
$\mathrm{rank}(O_t) = m \ge O(K \log \frac{n}{K})$ with probability
one. This proves the corollary.

VII. Concluding Remarks 

In this paper we obtained sufficient conditions for the
observability of a linear system where the number of non-zeros
in the initial state is known to be less than the dimensionality
of the system. The discussion also applies to the case where certain
elements have known values and we wish to reconstruct the
unknown values.

Two models were considered: a deterministic model
and a stochastic model. We observed that a much
lower number of observations (even when the observations are
randomly picked) can be used to recover the initial condition.
Furthermore, this can be done by a linear or quadratic program.
An interesting extension of this problem is the case where
some terms are non-zero but are known to
have small magnitude, that is, a robust formulation of initial
condition recovery when the disturbance lies in an $\ell_2$ ball of
small radius.

Compressive sensing offers new directions for the design of
information structures in networked control systems. Recent
work [6] lays out designs based on compressive sensing
principles for such systems. We believe there will be further
results specific to control systems, in particular on the inherent
interaction between estimation and control in decentralized
control systems.